A system for voice conversion based on adaptive filtering and line spectral frequency distance optimization for text-to-speech synthesis

Authors

  • Özgül Salor-Durna
  • Mübeccel Demirekler
  • Bryan L. Pellom
Abstract

This paper proposes a new voice conversion algorithm that modifies a source speaker’s speech so that it sounds as if it were produced by a target speaker. To date, most approaches to speaker transformation have been based on mapping functions or codebooks. We propose a linear-filtering-based approach to the problem of mapping the spectral parameters of one speaker to those of another. In the proposed method, the transformation is performed by filtering the source speaker’s Line Spectral Pair (LSP) frequencies to obtain estimates of the target speaker’s LSP frequencies. The speech signal is time-aligned into a sequence of HMM states, and a filter is designed for each HMM state using the aligned data. We consider two methods for spectral conversion. A linear transformation of the LSPs is obtained using an adaptive steepest gradient descent approach. The mean values of the LSPs are adjusted to match those of the target speaker. To prevent the converted LSP vectors from producing unstable vocal tract filters, weighted least-squares estimation is used. This approach optimizes the differences between the source and target LSPs, with weights equal to the inverses of the source LSP variances. The approach is integrated into a Time Domain Pitch Synchronous Overlap and Add (TD-PSOLA) analysis-synthesis framework. The algorithm is evaluated objectively using a distance measure based on the log-likelihood ratio of observing the input speech, given Gaussian mixture speaker models for both the source and the target voice. Results under the Gaussian-mixture-model-based criteria demonstrate consistent transformation on a 5-speaker database. The algorithm offers promise for rapidly adapting text-to-speech systems to new voices.
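As a rough illustration of the steepest-descent idea mentioned in the abstract, the sketch below fits a linear map from source LSP vectors to target LSP vectors for the frames of a single HMM state, using a plain LMS-style update. It is not the authors' implementation: the abstract does not specify the filter structure, so the full matrix `W`, the bias `b`, the step size, the epoch count, and the function names `convert`, `lms_fit`, and `mean_sq_error` are all assumptions made for this example.

```python
def convert(W, b, lsp):
    """Apply the linear map W * lsp + b to one LSP vector."""
    return [sum(w * x for w, x in zip(row, lsp)) + bias
            for row, bias in zip(W, b)]


def lms_fit(source, target, step=0.05, epochs=20):
    """Fit target ~ W * source + b by per-frame steepest descent (LMS).

    `source` and `target` are equal-length lists of time-aligned LSP
    vectors belonging to one HMM state (an assumption of this sketch).
    """
    dim = len(source[0])
    # Start from the identity map, so the initial output is the source LSPs.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    b = [0.0] * dim
    for _ in range(epochs):
        for x, y in zip(source, target):
            pred = convert(W, b, x)
            for i in range(dim):
                err = y[i] - pred[i]
                # Gradient step on the squared error of output dimension i.
                for j in range(dim):
                    W[i][j] += step * err * x[j]
                b[i] += step * err
    return W, b


def mean_sq_error(W, b, source, target):
    """Average squared Euclidean distance between converted and target LSPs."""
    total = 0.0
    for x, y in zip(source, target):
        pred = convert(W, b, x)
        total += sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return total / len(source)
```

In a per-state setup, `lms_fit` would be called once for each HMM state on that state's aligned frames. The step size must stay small relative to the squared norm of the input vectors, or the updates can diverge. This sketch does not enforce the ascending ordering of LSP frequencies that a stable vocal tract filter requires.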

Similar resources

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods for optimizing the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. In current methods, because a single conversion function is used to convert all speech units, and because statistical averaging leads to spectral smoothing, we will observe quality ...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the human ability of speech production to the computer. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Codec integrated voice conversion for embedded speech synthesis

Voice conversion technologies transform individual characteristics of speech patterns while preserving the original content, and can be widely used in speech processing. Considering limited system resources, in particular, of embedded concatenative speech synthesis, voice conversion may reduce the memory consumption of the acoustic database. Voice conversion enables the intra-gender or cross-ge...

Speech Enhancement by Modified Convex Combination of Fractional Adaptive Filtering

This paper presents new adaptive filtering techniques for use in speech enhancement systems. Adaptive filtering schemes are subject to different trade-offs regarding their steady-state misadjustment, speed of convergence, and tracking performance. Fractional Least-Mean-Square (FLMS) is a new adaptive algorithm that performs better than the conventional LMS algorithm. Normalization of LMS ...

Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation

This paper describes a novel approach based on voice conversion (VC) to speaker-adaptive speech synthesis for speech-to-speech translation. Voice quality of translated speech in an output language is usually different from that of an input speaker of the translation system since a text-to-speech system is developed with another speaker’s voices in the output language. To render the input speaker...

Publication date: 2003